Here are some useful tips on using the R console in RStudio.
2 + 2
5 * 4
2^3
2 + 3 * 4/(5 + 3) * 15/2^2 + 3 * 4^2
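To see how R arrives at its answer for the longer expression above, it can help to evaluate the pieces in the order R does: exponents first, then parentheses together with multiplication and division from left to right, then addition. The breakdown below is only an illustrative sketch of that order.

```r
# Exponents are evaluated first
2^2                                     # 4
4^2                                     # 16

# Parentheses, then * and / from left to right
3 * 4 / (5 + 3) * 15 / 4                # 5.625
3 * 16                                  # 48

# Finally the additions
2 + 5.625 + 48                          # 55.625

# The original expression gives the same value
2 + 3 * 4/(5 + 3) * 15/2^2 + 3 * 4^2    # 55.625
```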
"<-" is known as the assignment operator in R. For example, to assign the value 3 to the variable name x:
x <- 3
> x <- 3
> x
[1] 3
> y
Error: object 'y' not found
You can name your variables anything you want, but there are a few rules:
v.one and v_one are valid names but v one is not (because it includes a space).
More information on general R programming style can be found here.
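One way to check whether a piece of text would work as a variable name is base R's make.names() function, which converts arbitrary text into a syntactically valid name. The helper is_valid_name below is a hypothetical convenience wrapper written for this example; it is not part of R.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(name) {
  name == make.names(name)
}

is_valid_name("v.one")   # TRUE
is_valid_name("v_one")   # TRUE
is_valid_name("v one")   # FALSE, because of the space

# make.names() shows what R would turn the invalid name into
make.names("v one")      # "v.one"
```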
Objects in R can be a number of different types. Next, we'll discuss the three types you are most likely to encounter.
Character: 'j', 'hello', 'treatment A'
Numeric: 1, 550, 3.14
Logical: TRUE and FALSE, which are often used to control programming flow

Vectors are R's most basic data structure. When we created the variable x in the previous section, we had actually created a vector of length 1. The elements contained in a vector must all be of the same type (see the previous section). Vectors with more than one element are frequently constructed using the c() (concatenate) function:
a <- c(1, 2, 5.3, 6, -2, 4) # numeric vector
b <- c("one", "two", "three") # character vector
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE) # logical vector
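The class() function reports the type of a vector, which makes the "same type" rule easy to check. The sketch below also shows what happens when types are mixed inside c(): R silently converts everything to the most general type. The names flags and mixed are made up for this example.

```r
a <- c(1, 2, 5.3, 6, -2, 4)
b <- c("one", "two", "three")
flags <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)

class(a)       # "numeric"
class(b)       # "character"
class(flags)   # "logical"

# Mixing types forces every element to the most general type
mixed <- c(1, "two", TRUE)
class(mixed)   # "character"
mixed          # "1" "two" "TRUE"

length(a)      # 6
```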
You can access values inside of a vector by subsetting it with the [] operators. Examples:
> a[1] # access the first element of a
[1] 1
> a[3] # access the third element of a
[1] 5.3
> b[length(b)] # access the last element of b
[1] "three"
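The [] operator also accepts more than a single position. As a brief sketch using the vector a defined earlier, it can take a vector of positions, a range, negative positions (to drop elements), or a logical condition:

```r
a <- c(1, 2, 5.3, 6, -2, 4)

a[c(1, 3)]    # first and third elements: 1.0 5.3
a[2:4]        # elements 2 through 4: 2.0 5.3 6.0
a[-1]         # everything except the first element
a[a > 2]      # elements greater than 2: 5.3 6.0 4.0
```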
Matrices and arrays are vectors with dimensions. Since they are vectors, they can only contain elements of the same type. Matrices have 2 dimensions (rows & columns) and are created with the matrix function:
> matrix(1:25, ncol = 5, nrow = 5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
We can subset arrays and matrices in the same way we did vectors, but we must be conscious of their dimensionality. For a matrix, m, we would access the data stored in row 2 and column 5 by typing m[2, 5].
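As a short sketch of matrix subsetting, the example below stores the 5 x 5 matrix from above in a variable named m and pulls out a single value, a whole row, a whole column, and a small block. Leaving one side of the comma empty selects everything along that dimension.

```r
m <- matrix(1:25, ncol = 5, nrow = 5)

m[2, 5]        # single value in row 2, column 5: 22
m[2, ]         # all of row 2: 2 7 12 17 22
m[, 5]         # all of column 5: 21 22 23 24 25
m[1:2, 4:5]    # a 2 x 2 block from rows 1-2 and columns 4-5
dim(m)         # 5 5
```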
Data frames are created with the data.frame() function:
> data.frame(group = rep(c('trt', 'ctrl'), each = 4),
             response = rnorm(8))
  group    response
1   trt  0.01231700
2   trt  0.55028627
3   trt -0.24411263
4   trt  0.35996580
5  ctrl  0.89960231
6  ctrl -0.76686861
7  ctrl  0.03573818
8  ctrl  1.69159332
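Once a data frame is stored in a variable, a few base R functions give a quick overview of it. In the sketch below the name trial and the seed value are arbitrary choices for this example; set.seed() is only there so that rnorm() produces the same numbers on every run.

```r
set.seed(1)   # arbitrary seed so the random numbers are reproducible
trial <- data.frame(group = rep(c('trt', 'ctrl'), each = 4),
                    response = rnorm(8))

nrow(trial)          # 8 rows
names(trial)         # "group" "response"
trial$group          # the group column as a vector
table(trial$group)   # 4 'ctrl' rows and 4 'trt' rows
str(trial)           # compact summary of each column's type
```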
> person <- list(first = 'andrew', last = 'borgman', age = 25)
> person
$first
[1] "andrew"

$last
[1] "borgman"

$age
[1] 25
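Individual elements of a list can be retrieved by name. A minimal sketch using the person list from above: the $ operator and double brackets return the element itself, while single brackets return a smaller list.

```r
person <- list(first = 'andrew', last = 'borgman', age = 25)

person$first        # "andrew"
person[["age"]]     # 25
names(person)       # "first" "last" "age"

# Single brackets return a smaller list rather than the element itself
class(person["age"])     # "list"
class(person[["age"]])   # "numeric"
```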
The mean function is used to find the average of a vector of numbers:
> my_data <- c(10, 22, 44, 55, 14, 66)
> mean(my_data)
[1] 35.16667
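Functions often take extra arguments that change their behaviour. As a small sketch, mean() returns NA when the input contains a missing value unless the na.rm argument is set to TRUE. The vector with_gap is made up for this example.

```r
my_data <- c(10, 22, 44, 55, 14, 66)
sum(my_data) / length(my_data)    # same value as mean(my_data)

# with_gap is a made-up vector containing a missing value
with_gap <- c(10, 22, NA, 55)
mean(with_gap)                    # NA
mean(with_gap, na.rm = TRUE)      # 29
```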
The help() and ? functions can be used to search the documentation for functions matching the inputted text. For example, to view the documentation for mean:
?mean
help(mean)
> ?stdev
No documentation for ‘stdev’ in specified packages and libraries:
you could try ‘??stdev’
To search the documentation for a phrase rather than an exact function name, use ??.
??"standard deviation"
Entering this, we see the same kind of search results returned in the RStudio help pane.
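Besides ? and ??, base R also provides apropos(), which searches the names of the objects currently available in the session using a regular expression. The sketch below assumes a default R session and that the function being looked for is sd(), the standard deviation function in the stats package.

```r
# Find objects whose names contain "mean"
apropos("mean")

# An anchored pattern matches one exact name
apropos("^sd$")

# Nothing called stdev is available in a default session
apropos("^stdev$")

# sd() computes the standard deviation of a numeric vector
sd(c(10, 22, 44, 55, 14, 66))
```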
The R workspace can be thought of as a container holding all of the objects you've created during your R session. You can print a list of all of the objects in your current workspace using the ls() function. If we start a new R session, our workspace will be empty:
> ls()
character(0)
And we'll be able to see some objects if we add them:
> x <- 20
> y <- 30
> z <- x + y
> df <- data.frame(nums = 1:10, b = letters[1:10])
> ls()
[1] "df" "x"  "y"  "z"
save.image: Lets you save a snapshot of your entire workspace to a *.RData file.
Example: save.image('my-work.RData')
save: Lets you save a snapshot of a few specified objects to a *.RData file.
Example (saving df, the data.frame we created in the last section): save(df, file = 'my-dataframe.RData')
load: Lets you load your saved *.RData files back into R to continue working on them.
Example (loading df, the data.frame we created in the last section): load('my-dataframe.RData')

The rm() command allows you to selectively remove objects from your R session when they are no longer needed:
> ls() # start in an empty workspace
character(0)
> y <- 1
> z <- 1
> ls() # can see the two objects we created
[1] "y" "z"
> rm(y) # remove y
> ls()
[1] "z"
You can remove all objects in your workspace by using ls() to generate a vector of all the objects that have been created, and passing that to the rm() function:
rm(list = ls())
You can also use a button in RStudio's Environment panel to remove all of the objects in your workspace. RStudio will ask whether you are sure you want to delete all objects; choosing Yes will permanently delete every object in the workspace.
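The save, load, and rm functions described above can be combined into a simple round trip: write an object to disk, remove it from the workspace, and bring it back. The sketch below uses a temporary file so that nothing in your project folder is touched; the object name my_table is made up for this example.

```r
# Work in a temporary file so no project files are created
rdata_file <- tempfile(fileext = ".RData")

my_table <- data.frame(nums = 1:10, b = letters[1:10])
save(my_table, file = rdata_file)   # write the object to disk

rm(my_table)                        # remove it from the workspace
exists("my_table")                  # FALSE

load(rdata_file)                    # restore it under its original name
exists("my_table")                  # TRUE
nrow(my_table)                      # 10

unlink(rdata_file)                  # delete the temporary file
```

Note that load() restores objects under the names they had when saved, so it will overwrite any object of the same name already in the workspace.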
Most general-purpose R packages are hosted on the Comprehensive R Archive Network (CRAN). To install one of these packages, you would use install.packages("packagename"). You only need to install a package once; after that, you load it in each new R session using library(packagename). Here's how one would install and load the ggplot2 package.
# Install only once.
install.packages("ggplot2")
# Load the package every time you want to use it.
library(ggplot2)
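Because a package only needs to be installed once, scripts often check whether it is already present before calling install.packages(). The helper install_if_missing below is a hypothetical function written for this example; it relies on base R's requireNamespace(), which returns TRUE when a package is installed and can be loaded.

```r
# Hypothetical helper: install a package only when it is missing
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
install_if_missing("stats")

# For the package used in this tutorial one would call:
# install_if_missing("ggplot2")
# library(ggplot2)
```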
Now that we're feeling a bit more comfortable with the R environment, we'll explore how we can import our own experimental data into R.
Before going any further, please download the zip file below and extract its contents somewhere on your computer.
The easiest way to store data for import into R is in text files. R has facilities for importing data directly from Excel spreadsheets or from some database formats, but performing such operations is outside the scope of this course.
Flat text files containing data typically have some type of special character (delimiter) – think: tab, semicolon, space – for separating different columns of data. Data stored in text files most commonly falls into the following formats.
Tab-delimited files typically have the extension .txt or .tsv.
ID        Group    Gene1   Gene2   Gene3  Gene4
Sample 1  Group 1  9.695   4.694   3.733  4.874
Sample 2  Group 1  8.087   3.276   2.220  3.095
Sample 3  Group 1  9.885   6.297   5.842  8.233
Sample 4  Group 1  7.832   -1.286  2.594  -1.089
Sample 5  Group 1  10.239  4.474   3.300  3.377
File -> Save As -> Tab-Delimited Text
Comma-separated files typically have the extension .csv.
"ID","Group","Gene1","Gene2","Gene3","Gene4"
"Sample 1","Group 1",9.695,4.694,3.733,4.874
"Sample 2","Group 1",8.087,3.276,2.220,3.095
"Sample 3","Group 1",9.885,6.297,5.842,8.233
"Sample 4","Group 1",7.832,-1.286,2.594,-1.089
"Sample 5","Group 1",10.239,4.474,3.300,3.377
File -> Save As -> Comma Separated Values
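Both formats can be read with base R functions: read.delim() for tab-delimited text and read.csv() for comma-separated text. The sketch below passes two of the sample rows shown above through the text argument so that it runs without any files on disk; with real data one would supply a file path as the first argument instead. The names tsv_text, csv_text, from_tsv, and from_csv are made up for this example.

```r
# "\t" inside a string stands for a tab character
tsv_text <- "ID\tGroup\tGene1\tGene2\tGene3\tGene4
Sample 1\tGroup 1\t9.695\t4.694\t3.733\t4.874
Sample 2\tGroup 1\t8.087\t3.276\t2.220\t3.095"

csv_text <- '"ID","Group","Gene1","Gene2","Gene3","Gene4"
"Sample 1","Group 1",9.695,4.694,3.733,4.874
"Sample 2","Group 1",8.087,3.276,2.220,3.095'

from_tsv <- read.delim(text = tsv_text)   # tab-delimited input
from_csv <- read.csv(text = csv_text)     # comma-separated input

str(from_tsv)                   # column names and types
all.equal(from_tsv, from_csv)   # TRUE: both give the same data frame
```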
Use the read.delim function to import our .tsv data frame.
read.delim is a special alternative to R's more general read.table function (see ?read.table for details)
read.csv is another special alternative to read.table
read.csv & read.delim have
> gene.exprs.long[1:5, ] # print the first 5 rows
        ID   Group  Gene Expression
1 Sample 1 Group 1 Gene1   9.695228
2 Sample 2 Group 1 Gene1   8.087463
3 Sample 3 Group 1 Gene1   9.885696
4 Sample 4 Group 1 Gene1   7.832890
5 Sample 5 Group 1 Gene1  10.239599
> gene.exprs.long[1:5, 1:2] # print the first 5 rows and columns 1 & 2
        ID   Group
1 Sample 1 Group 1
2 Sample 2 Group 1
3 Sample 3 Group 1
4 Sample 4 Group 1
5 Sample 5 Group 1
Usage: subset(dataset, logical conditions)
> subset(gene.exprs.long, ID == 'Sample 1') # All Sample 1 measures
          ID   Group  Gene Expression
1   Sample 1 Group 1 Gene1   9.695228
51  Sample 1 Group 1 Gene2   4.694323
101 Sample 1 Group 1 Gene3   3.733354
151 Sample 1 Group 1 Gene4   4.874305
> subset(gene.exprs.long, Expression > 10) # gene expression > 10
          ID   Group  Gene Expression
5   Sample 5 Group 1 Gene1   10.23960
6   Sample 6 Group 1 Gene1   11.01402
10 Sample 10 Group 1 Gene1   10.10460
13 Sample 13 Group 2 Gene1   10.21796
16 Sample 16 Group 2 Gene1   10.03342
46 Sample 46 Group 5 Gene1   10.11114
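The bracket and subset() approaches shown above can be tried on a small stand-in table. The data frame long below is an assumption made for this example: it mimics the column layout of gene.exprs.long using a few of the Gene1 and Gene2 values from the sample file, and is not the tutorial's full data set.

```r
# Small stand-in with the same columns as gene.exprs.long
long <- data.frame(
  ID = rep(c("Sample 1", "Sample 4", "Sample 5"), times = 2),
  Group = "Group 1",
  Gene = rep(c("Gene1", "Gene2"), each = 3),
  Expression = c(9.695, 7.832, 10.239, 4.694, -1.286, 4.474)
)

# Bracket indexing: rows first, then columns
long[1:2, ]                      # first two rows, all columns
long[, c("ID", "Expression")]    # all rows, two columns by name

# subset() with more than one condition
subset(long, Gene == "Gene1" & Expression > 9)

# The select argument keeps only the named columns
subset(long, Expression < 0, select = c(ID, Gene))

# Equivalent bracket form of the first subset() call
long[long$Gene == "Gene1" & long$Expression > 9, ]
```

subset() is convenient at the console because column names can be written without the long$ prefix; the bracket form does the same job and is often preferred inside functions.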
There are three main graphing frameworks available in R for creating high-quality plots: base graphics, lattice, and ggplot2.
I would suggest learning ggplot2.
ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics (taken from ggplot2 site). ggplot2 allows intro R users to create high quality data visualizations with little effort using an intuitive plotting syntax.
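To give a feel for the layered syntax, here is a minimal sketch of a ggplot2 plot. It assumes the ggplot2 package is installed (the code is skipped otherwise), and the data frame plot_data with its values is made up purely for illustration.

```r
# Made-up example data: expression values for two groups
plot_data <- data.frame(
  Group = rep(c("Group 1", "Group 2"), each = 3),
  Expression = c(9.7, 8.1, 9.9, 10.2, 10.0, 9.5)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Data and aesthetic mapping come first, then layers are added with +
  p <- ggplot(plot_data, aes(x = Group, y = Expression)) +
    geom_boxplot() +
    geom_point() +
    labs(title = "Expression by group")
  # Typing p at the console, or calling print(p), draws the plot
}
```

Each + adds a layer or setting to the plot object, so a plot can be built up and adjusted one piece at a time.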